Multilingual corpora: models, methods, uses
Similar resources
Multilingual Corpora for Cooperation
MLCC was a corpus acquisition project funded by the EC Telematics program. The aim was to collect a set of texts representing a substantial improvement in the range, quantity and quality of corpus material available. Two sub-corpora have been defined to help meet the needs for multilingual data, consisting of a comparable set of texts in six languages and a parallel set of data in nine languages. The c...
Pseudo-Aligned Multilingual Corpora
In machine translation, document alignment refers to finding correspondences between documents which are exact translations of each other. We define pseudo-alignment as the task of finding topical, as opposed to exact, correspondences between documents in different languages. We apply semi-supervised methods to pseudo-align multilingual corpora. Specifically, we construct a topic-based graph for ea...
Automated Alignment in Multilingual Corpora
Experiences in computing alignments at the paragraph and sentence level within the TRANSLEARN project, part of the European Union's "LRE" programme of research and development in language engineering, are reported. About 98% of the sentences in pairs of corpora in different languages have been aligned correctly by a method that uses dynamic programming on numbers of characters per sentence. This paralle...
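As a rough illustration of what "dynamic programming on numbers of characters per sentence" can look like, here is a minimal Python sketch. It is not the TRANSLEARN method: the function name `align_by_length`, the penalty values and the plain absolute-difference cost are all assumptions chosen for illustration, where published length-based aligners use a probabilistic cost instead.

```python
def align_by_length(src, tgt, skip_penalty=10.0, merge_penalty=2.0):
    """Align two lists of sentences using only their character counts.

    Hypothetical sketch: the cost of a step is the absolute difference in
    character length between the grouped sentences plus a fixed penalty.
    """
    # Allowed steps: (source sentences used, target sentences used, penalty).
    beads = [
        (1, 1, 0.0),            # one-to-one
        (1, 0, skip_penalty),   # source sentence left unaligned
        (0, 1, skip_penalty),   # target sentence left unaligned
        (2, 1, merge_penalty),  # two source sentences to one target
        (1, 2, merge_penalty),  # one source sentence to two targets
    ]
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = cheapest way to align the first i source and j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                step = abs(len_s - len_t) + penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj)

    # Walk the back-pointers from the end to recover the aligned groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs


# Toy input: only the lengths matter, so placeholder strings are enough.
source = ["a" * 10, "b" * 20, "c" * 5]
target = ["A" * 11, "B" * 9, "B" * 10, "C" * 5]
alignment = align_by_length(source, target)
```

Each returned pair holds the indices of the source and target sentences grouped together, so a one-to-two match shows up as one source index paired with two target indices.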
Multilingual Aspects of Monolingual Corpora
If someone were to ask computational linguists what the most important trend in linguistics had been in the last decade, the majority would most probably answer that it was the massive use of large natural language corpora in many linguistic fields. The concept of collecting large amounts of written or spoken natural language data has become extremely important...
Building Strong Multilingual Aligned Corpora
Recent advances have allowed algorithms that learn from aligned natural language texts to exploit aligned sentences in more than two languages. We investigate ways of combining $\binom{N}{2}$ bilingual aligned corpora to create a multilingual aligned corpus across N languages. As a result of the combination of several corpora, our algorithms output a multilingual corpus, with each aligned tup...
Journal
Journal title: Tradterm
Year: 2004
ISSN: 2317-9511, 0104-639X
DOI: 10.11606/issn.2317-9511.tradterm.2004.47044